Modeling of Environmental Factors Affecting the Prevalence of Zoonotic and Anthroponotic Cutaneous, and Zoonotic Visceral Leishmaniasis in Foci of Iran: a Remote Sensing and GIS Based Study.

Background
Leishmaniasis is a re-emerging serious international public health problem, and both visceral and cutaneous types of leishmaniasis became important endemic diseases in Iran. In this study, the relationships between environmental factors (vegetation and elevation) and the prevalence of diseases have been investigated.


Methods
All international and national online databases were searched by terms such as leishmaniasis, incidence, prevalence and other related words attributed to Iran and published until first quarter of 2015. The developed database in Excel, later imported to the ArcMap for spatial analyst and mapping. Afterwards, the software was used for modeling the relationship between the prevalence/incidence and environmental variables (vegetation and elevation) by both linear and nonlinear regression.


Results
After mapping the prevalence data from 144 studies, considering non-parametric ANOVA, the tendency of zoonotic visceral leishmaniasis to presence in high elevation and high vegetation was more than Anthroponotic and zoonotic cutaneous leishmaniasis. While linear regression showed weaker results for modeling, however, additive nonparametric regression analysis suggested that 10km buffers for elevation, and 10 as well as 50km buffers for vegetation could contribute in better fitness in modeling of these variables.


Conclusion
The detailed maps for distribution of disease concluded. The nonlinear regression is a reliable predictor of the relationship between environmental factors and disease incidence, although more and wide researchers are needed to confirm it.


Introduction
Leishmaniasis -a re-emerging serious international public health problem-is considered as one of the most neglected tropical diseases. The disease is distributed in new foci due to influence of many risk factors, including environmental factors (1)(2)(3)(4). Almost 200000 to 400000 new cases of visceral leishmaniasis (VL), 0.7 million to 1.3 million new cases of cu-taneous leishmaniasis (CL) and 20000 to 30000 deaths annually" are the estimations of WHO last report about importance of the disease (2).
Two dominant types of leishmaniasis there in Iran are: zoonotic visceral or kala-azar (ZVL), the most serious and fatal type of the disease, and both types of cutaneous leishmaniasis namely urban form or anthroponotic cutaneous leishmaniasis (ACL) and rural form or zoonotic cutaneous leishmaniasis (ZCL). Leishmania donovani complex are the causes of ZVL protozoan disease which can lead to patient mortality (5). Evidence indicates an increase in the incidence of disease in the Old and New Worlds at the early years of the present century. In recent decades, ZVL became an important endemic disease in the Northwest (6), South and Southwest (7), and Northeast (8) of Iran, especially in nomads. The majority of ZVL cases (92.8%) in country were found among children with age under 12yr old (9). The most recently, direct agglutination test (DAT) as a simple, affordable and Practical test, has found its way as a special quick and accurate test. Thus it has been recommended and applied widely for clinical diagnosis and seroprevalence studies of ZVL in zoonosis and anthroponotic reservoirs in endemic and nonendemic areas of Iran (10). Due to ecologically dependence of the disease vectors and reservoirs to environment, outbreaks mostly are common in lowlands rural or suburban areas (less than 600M elevation), with a heavy annual rainfall, a moderate temperature (15-38 °C), and almost dense vegetation (2).
Countries of Mediterranean coasts (Algeria and Syrian Arab Republic), South Americas (Brazil and Colombia), and Middle East (Islamic Republic of Iran and Afghanistan) and Central Asia countries are the most reported foci for CL and the major contribution of CL occur in this area (2). The CL cases in Iran during the period of 2001 to 2008 dramatically has been increased more than two-fold and also over half of 31 provinces of country have CL endemic foci (11). As reported by Center for Disease Control, the annual incidence of leishmaniasis in the country due to referring to health centers has reached about 20000 cases. While the actual cases of disease may be far more than these reports that may even reach up to five times (12).
As for the first time carried by John Snow in 1854 due to diarrheal disease detection in London, spatial modeling as a useful tool in epidemiological studies could be used. Today, Geographic Information System (GIS) in relation to the processed images from remote sensing (RS) has been used to map the geographical distributions of disease prevalence, factors affecting the transmission and control of disease, and the spatial modeling of environmental factors that have meaningful impacts on disease occurrences (9, 13). Vector control programs are associated with logistic problems, mainly due to poor understanding of vector ecology (14). Expanding inexpensive and effective models of integrated management for better control of CL is a critical target which need a thorough will, alongside epidemiology studies and understanding of the ecology of the disease (11) and the role of social, ecological and specially environmental variables for risk factors such as the vegetation and elevation (two most widespread and accessible environmental variables). The relationship between epidemiological components of leishmaniasis (incidence, prevalence, and seroprevalence) and environmental variables (vegetation, elevation, and/or other factors) can be used in GIS modeling and may be useful for secondary uses including prediction. Therefore, contributing to better planning for control activities and the establishment of early warning systems, incorporating remotely sensed information in epidemiological studies can help public health policy makers in having a better notion and comprehensive understanding to find the relationships between disease and its environmental risk factors (15).
The remotely sensed Normalized Difference Vegetation Index (NDVI) -the predominantly index which used for vegetation coverage-has found its widespread applications due to fluctuations it has along with meteorological and environmental components (14). This index together with the other remotely sensed data (11, 16) like the Digital Elevation Model (DEM), have been used extensively in monitoring some vector-borne diseases in-cluding leishmaniasis throughout the world. While many studies have surveyed the associations between the mean of NDVI and DEM with incidence or prevalence rate of a type of leishmaniasis locally, however, they do not study all variables in a vast area.
In this study, we systematically reviewed all studies regarding the incidence and/or prevalence rate of all categories of leishmaniasis investigated in Iran from 1976 to 2015. After mapping the results, using Spatial Analysis and Regression techniques, statistical association between variables have been modeled in detail.

Materials and Methods
To identify the incidence and prevalence of leishmaniasis in endemic and nonendemic area of ZVL, ZCL, and ACL of Iran, we searched and reviewed the literature based on international database including MEDLINE, ISI, SCOPUS, and Google Scholar, and national databases such as SID, IRANDOC, MagIran, and ISC. Search protocols have been based on using the terms related to all types of leishmaniasis, fauna, vectors, reservoirs, incidence, prevalence, seroprevalence, Direct Agglutination Test (DAT), and their Persian equivalents with the geographic term "Iran".
Overall, 503 articles published from 1976 through Jun 2015 in English, French, and Persian were founded in first search from databases. Besides, by referring to CV of famous professors, abstract seminars, thesis, non-digital papers, and the reference/citations of articles, 121 additional sources were found. After the elimination of duplicate and inappropriate articles by two separate teams, we consequently included 144 studies that clearly addressed confirmed clinical cases for human seroprevalence (DAT ≥ 1: 3200)/ incidence of ZVL and active lesion prevalence/total prevalence (active lesion + scar)/incidence of ACL and ZCL in Iran. Disease data were extracted by researchers directly from body text of arti-cles to database tables (Dbase format using the MS Excel software) containing all cities (1246 point) and villages with a population more than 500 people (12034 points), disease information, and descriptive data. Depending on the raw data presented in each article, one or more than one category of epidemiologic data was included in the main table. These numeric values (all in percent) for ACL and ZCL including active lesion prevalence, total prevalence (active lesion + scar), and incidence and for VCL including seroprevalence and incidence were directly or indirectly extracted using study population. The developed database later imported to the ArcMap ® software version 9.3 (ESRI, Redlands, USA) software platform for Spatial Analyst and mapping all types of leishmaniasis. R software version 3.2.2 (17) was used for analyzing the relationship between the prevalence/incidence and continues spatial variables.
Shapefile maps for political subdivisions of provinces, counties, rural districts, and points of cities, towns, and villages with their attribute tables were obtained from National Cartographic Center, Iran. The Moderate Resolution Imaging Spectroradiometer (MODIS) onboard the Terra satellite, is one of most applied NDVI images source in literature. NDVI calculated from surface reflectance in the near-infrared ρNIR and red ρR portions of the electromagnetic spectrum, using the equation 1 ( and zonal statistics were performed on them. Overall, 5km, 10km, 20km, and 50km buffer zones centered on each city/village point reported values for leishmaniasis, and their zonal Min, Max, Minority, Majority, Median and Mean statistics were calculated and exported in Dbase tables. Finally, massive joined tables were called in R, and statistical calculations were completed for each.

Data analysis
CSV tables for any purposes derived from the main disease data table and called in R console. We chose to test the normality of data with One-sample Kolmogorov-Smirnov and Shapiro-Wilk normality test. Next, we modeled the relationships between prevalence/ incidence values and the continuous independent variables. Two regression scenarios were considered. First, using simple linear regression models, a stepwise backward elimination procedure was used with the variables, finally resulting in the model contains merely the variables which associated with the outcome. Secondly, using generalized additive models -which use spline or local regression-a stepwise backward elimination procedure, was used again to reach the final model. The additive nonparametric regression model is presented in equation 2 (21): 2) y= β0+ m1(x1)+ m2(x2)+ . . .+ mk(xk)+ ɛ Where the partial-regression functions mj are fit using a simple-regression smoother, such as local polynomial regression or smoothing splines. In nonparametric regression, in contrast, the object is to estimate the regression function directly without specifying its form explicitly. In nonparametric comparing with simple regression, the ultimate goal is to obtain the regression function directly from internal computation without any open aspect (22). The level of 5% significance for all tests was considered.

Results
We found 144 studies that directly informed the active lesion/scar prevalence and/or incidence of leishmaniasis in Iran from 1976-2015 (Table 1). These data considered as a base for construction of 89 ACL points including 15, 19, and 71 urban/rural points for active lesion, total prevalence (active lesion + scar), and incidence respectively. The same way, data for 420 ZCL points included 67, 68, and 362 points. Cases for 355 ZVL points for the seroprevalence and incidence were 283 and 82 respectively. Figures 1, 2, and 3 show the maps of three types of leishmaniasis in Iran separately. The DEM map and 15yr average of NDVI map also attached on VL and ACL maps respectively. Rating in the categories are High, Middle and Low for prevalence data and obtained based on expert consensus.
Scatter diagrams and normality plot of each independent variable and the disease incidence/ prevalence were created and visually inspected before running regression. Some of these plots traced in Fig. 4. Normal probability of data in our variables has been questioned, then, the normality of data statistically tested. Therefore, using both One-sample Kolmogorov-Smirnov and Shapiro-Wilk normality test, the data are without normality. However, both linear and nonparametric regression fitted on data.
To compare of the means of main independent variables (DEM and NDVI) versus types of leishmaniasis, homogeneity of variances rejected by Fligner-Killeen test, but due to nonparametric Kruskal-Wallis test, we result in significant difference between means of variables. Regard to boxplots for comparison of means in Fig. 5, and nonparametric analysis of variances in Table 2, tendency of ZVL to presence in high elevation and high vegetation is more than ACL and ZCL. Similarly, we compare the means of types of leishmaniasis occurrences and epidemiological variables of disease in Table 3. The regression models were evaluated using R software packages. Tables 4 and 5 generally descript data structure and assessment of the effect of, or relationship between, explanatory variables on the response by using multiple linear regression models.
To determine the best final fitted models for each group of data, based on P-value, adjusted R-squared, and AIC (Akaike's 'An Information Criterion'), using stepwise backward simple linear regression, following model produced: (although second order or interaction models had a little more R-squared, due to the complexity were excluded). The "s" function, used in specifying the model formula, indicates that each term is to be fitted with a smoothing spline. The surface that fitted on the model and smoothing splines have been shown in Fig. 7 and 8. In equation 5, the 5.43 parameters are used for the MEAN$NDVI10 (mean of 10km buffer for NDVI) term, and 7.98 for the MEAN$DEM10 term, the degrees of freedom for the model is the sum of these plus 1 for the regression constant. annual average Normalized Difference Vegetation Index map. High: active lesion ≥ 2% or active lesion+scar ≥ 20% or incidence ≥ 2%; Middle: 2% ˃ active lesion ≥ 0.5% or 20% ˃ active lesion+scar ≥ 5% or 2 ˃ incidence ≥ 0.5%, Low: 0.5% ˃ active lesion or 5% ˃ active lesion+scar or 0.5˃ incidence 46  (27)       High: active lesion ≥2% or active lesion+scar ≥20% or incidence ≥ 2%, Middle: 2% ˃ active lesion ≥ 0.5% or 20% ˃ active lesion+scar ≥ 5% or 2˃ incidence ≥ 0.5%, Low: 0.5% ˃ active lesion or 5% ˃ active lesion+scar or 0.5 ˃ incidence

Discussion
Maps based on the actual data for distribution of all leishmaniasis types in Iran, with identifying the detailed variables of environmental factors affecting the disease distribution, are the most important findings of this study. Due to the limited access to official leishmaniasis data of Center for Disease Control, integration all data from both CL types and rural and urban foci in the heart of the county official reports (lack of separation species), tens of kilometers distance between some foci and center of counties often identified as disease points (lack of spatial resolution), unknown types of CL in most of official data, and passive screening in Public Health Centers, we decided to use the real data available in the literatures with a detailed focus on disease points. Many of scientific epidemiological studies predominantly focus on foci of diseases and have very valuable and verified information about all aspects of 53 http://jad.tums.ac.ir Published Online: March 18, 2018 the disease, such as prevalence, incidence, vectors, reservoirs, and control of disease data. Maps that contain all of the scientific information investigate by researchers of a country over several decades could well represent the disease situation in that country. Although the nature of the leishmaniasis itself was not the aim of our study, however, distribution of disease to the breakdown mapping can present more accurate view of disease for researchers.
Though vegetation statistically has a significant relationship with the prevalence of vector-borne diseases, this study (especially for the simple linear regression models), with a weak confirmation, implicitly point to a complex and indirect relationship. Therefore, more detailed studies and comprehensive investigations would be needed to prove the theorem. A study on effect of minimum, maximum and mean NDVI on the distribution of the VL vector in district of Bihar, India showed that max and mean variables with an R 2 about 0.6 strongly associated with the disease (14). Werneck et al. with a mention to the NDVI, emphasized on the impact of multilevel variable modeling on the development of leishmaniasis in Brazil (154). Study of environmental factors influencing the distribution of vectors involved in transmission of disease in north-eastern Italy revealed that areas with high winter NDVI may be related to the survival of the larvae in moist soil (155). According to the results of relationship between CL and NDVI in northeast province of Iran, Golestan, arid and semiarid regions with low vegetation are the most involved area for CL occurrence (11). In our study, as expected due to their rural and nomadic natures of ZCL and ZVL in Iran, the values for NDVI index increase for ACL, ZCL, and ZVL respectively ( Table 2). There is not much variation among them, and all three types are almost inclined to intermediate vegetation (neither too sparse nor too dense). The distinction in the result of numerous studies would be aroused from differences between climate factors, socio-economic variables, in-strumental confounders and differences in the interpretation of the results.
Incidence of various types of leishmaniasis in different heights may result from the diversity in the reservoirs and vectors of the disease. Whereas ACL and ZVL had an interest for presence in high altitude areas, ZCL have been distributed in lower altitude ( Table 2). This result for ZVL has obtained also in Ilam Province spatial modeling, and elevation had a negative impact on CL prevalence (156). Multivariate analysis in a VL study in eastern Sudan found the elevation as important variables, while the primary analysis did not show any correlation with disease incidence (157). However, while DEM can use as a good variable for topographic purposes, it may do not has most appropriate function as elevation in all regions (155).
The role of environmental factors in the development of disease vectors is obvious. While the CL is the main vector-borne disease in Iran, about 80% of cases are zoonotic and spread in 17 out of 31 provinces (158). Some studies have been modeled the environmental factors for vectors and reservoirs of leishmaniasis. Altitude and land cover was found to be from most important factors for suitability of niches for sand flies. Areas with 990m were found and 1235m average altitude were related to probability of presence of ZCL and ACL vectors respectively (159). Niche modeling of main reservoir hosts in Iran showed that among topography variables, slope has the most contribution in forecasting risk of disease (160). A predictive degree-day model was used for development time, population dynamics and activities of VL vector sandflies in field in northwest Iran (161). Among climate factors, minimum of temperature, mean of humidity, and rainfall had the most impact on ZVL distribution in Golestan, north-east of Iran (162). None of these studies include NDVI as a variable.
The vegetation coverage has been discussed as a risk factor for VL and CL types in previous studies (163,164). The NDVI has not considered for intrinsic specifications itself but also it covers some other important environmental and ecological variables such as soil types and moisture, humidity, slope degree and even elevation of its ambient land (16). The majority of studies, evaluated the presence/absence of vegetation or have used a cut-off value point for NDVI, but less attention has been paid to the sheer numerical modeling. Linear regression results in this study show that none of subcategories of independent variables have a favorable associated with DEM and NDVI and do not fit on the linear model very well (low R-squared). Only some of models when focused on leishmaniasis types, the R-squared reach to 0.2-0.4. This fact could be predicted from plots associated with variables (Fig. 4). Despite the lack of goodness of fitness of simple linear models, Additive Nonparametric Regression analysis presents a better prediction model for data of this study. These models do not output real intercepts for variables, but using their internal formulas, they are able to provide predictive models for input data. As a nonparametric regression attributes, no obvious estimates were seen for parameters (21), and then to figure out regression results, regards to necessitate of using fitted regression graphics, we used this method as shown in Fig. 7. Data frame used to find predicted values on the regression surface, fitted to data of model to drawn this figure.
For more clarification, functions which explained previously that present in inherent content of the additive regression, have been shown in partial regression of our model in Fig. 8.

Conclusion
Although the results of modeling and predictive power of the models in this study was not great but was somewhat satisfied. Modeling environmental factors which affecting ecological disease, need attention to the role of all these factors together. Moreover, integration the scenarios such as finding hotspots in GIS, using statistical logistic regression, more specific factors of diseases such as vector and reservoir species, can be used in modeling the relationships, more meaningful and more clear. However, a good model needs to consider all the factors involved the prevalence and incidence of a disease.